A Graph-based Approach for Dynamic Clustering
Abstract
Clustering algorithms automatically discover structure in large data sets, or provide coherent groups of objects independent of any user-defined classes. They are used in many domains: astronomy, information retrieval, image segmentation, biological applications and so on. Several clustering techniques have been proposed [1]; they are either hierarchical or partitioning techniques. We have recently proposed a new partitioning clustering technique based on the b-coloring of a graph [2]. This technique consists in coloring the vertices with the maximum number of colors such that (i) no two adjacent vertices (vertices joined by a weighted edge representing the dissimilarity between objects) have the same color (proper coloring), and (ii) for each color c, there exists at least one vertex with this color that is adjacent (has a sufficient dissimilarity degree) to all other colors. Such a vertex is called a dominating vertex; a class may contain several. A dominating vertex reflects the properties of its class and also guarantees that the class is distinctly separated from all other classes of the partitioning. The b-coloring based clustering method in [2] makes it possible to build a fine partition of the data set (numeric or symbolic) into clusters when the number of clusters is not specified in advance. Motivated by applications such as Web content management and information retrieval, which require a high rate of data set updates, the current paper deals with the problem of dynamic clustering: updating clusters without having to frequently perform a complete re-clustering. In the following, an online (or incremental) algorithm is proposed for the b-coloring technique presented in [2]. It relies solely on knowledge of the dissimilarity matrix between instances and of the cluster dominating vertices.
The difference between this learning b-coloring approach and the traditional one [2] is, in particular, its ability to process new data as they are added to the data collection, possibly updating existing clusters.
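The two b-coloring conditions described above can be checked mechanically. The following is a minimal sketch, not the algorithm of [2]: it assumes the graph is obtained by joining two objects whenever their dissimilarity exceeds a user-chosen threshold, and it only verifies a given coloring and lists its dominating vertices. The function names, the threshold rule, and the example matrix are illustrative assumptions.

```python
from typing import Dict, List, Set


def threshold_graph(dissim: List[List[float]], threshold: float) -> Dict[int, Set[int]]:
    """Assumed construction: join i and j when dissim[i][j] > threshold."""
    n = len(dissim)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dissim[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_proper(adj: Dict[int, Set[int]], color: List[int]) -> bool:
    """Condition (i): no two adjacent vertices share a color."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


def dominating_vertices(adj: Dict[int, Set[int]], color: List[int]) -> Dict[int, List[int]]:
    """For each color, list the vertices adjacent to every other color."""
    all_colors = set(color)
    dom: Dict[int, List[int]] = {c: [] for c in all_colors}
    for v in adj:
        seen = {color[u] for u in adj[v]}
        if all_colors - {color[v]} <= seen:
            dom[color[v]].append(v)
    return dom


def is_b_coloring(adj: Dict[int, Set[int]], color: List[int]) -> bool:
    """Conditions (i) and (ii): proper, and every color has a dominating vertex."""
    return is_proper(adj, color) and all(dominating_vertices(adj, color).values())


# Hypothetical dissimilarity matrix for four objects (two natural groups).
D = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
adj = threshold_graph(D, threshold=0.5)
print(is_b_coloring(adj, [0, 0, 1, 1]))        # two clusters: {0, 1} and {2, 3}
print(dominating_vertices(adj, [0, 0, 1, 1]))
```

In this sketch each color class is read as one cluster. A coloring such as `[0, 1, 2, 2]` is proper on the same graph but fails condition (ii), because no vertex of color 0 is adjacent to color 1.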